Results 1 - 20 of 26

1.
Article in English | MEDLINE | ID: mdl-38319761

ABSTRACT

Safe reinforcement learning (RL) has shown great potential for building safe general-purpose robotic systems. While many existing works have focused on post-training policy safety, it remains an open problem to ensure safety during training as well as to improve exploration efficiency. Motivated to address these challenges, this work develops shielded planning guided policy optimization (SPPO), a new model-based safe RL method that augments policy optimization algorithms with path planning and a shielding mechanism. In particular, SPPO is equipped with shielded planning for guided exploration and efficient data collection via model predictive path integral (MPPI), along with an advantage-based shielding rule to keep the above processes safe. Based on the collected safe data, a task-oriented parameter optimization (TOPO) method is used for policy improvement, as well as for observation-independent latent dynamics enhancement. In addition, SPPO provides explicit theoretical guarantees, i.e., clear theoretical bounds for training safety, deployment safety, and the learned policy performance. Experiments demonstrate that SPPO outperforms baselines in terms of policy performance, learning efficiency, and safety performance during training.

2.
IEEE Trans Cybern ; PP, 2024 Jan 01.
Article in English | MEDLINE | ID: mdl-38163300

ABSTRACT

Safety as a fundamental requirement for human-swarm interaction has attracted a lot of attention in recent years. Most existing approaches solve a constrained optimization problem at each time step, which has a high real-time requirement. To deal with this challenge, this article formulates the safe human-swarm interaction problem as a Stackelberg-Nash game, in which the optimization is performed over the entire time domain. The leader robot is supposed to be in a dominant position, interacting directly with the human operator to realize trajectory tracking, and is responsible for guiding the swarm to avoid obstacles. The follower robots always take their best responses to the leader's behavior with the purpose of achieving the desired formation. Following the bottom-up principle, we first design the best-response controllers, that is, Nash equilibrium strategies, for the followers. Then, a Lyapunov-like control barrier function-based safety controller and a learning-based formation tracking controller for the leader are designed to realize safe and robust cooperation. We show that the designed controllers can make the robotic swarms move in a desired geometric formation following the human command and modify their motion trajectories autonomously when the human command is unsafe. The effectiveness of the proposed approach is verified through simulation and experiments. The experimental results further show that safety can still be guaranteed even when a dynamic obstacle is present.

3.
IEEE Trans Cybern ; 53(8): 5000-5012, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37030690

ABSTRACT

This article is concerned with the output feedback security control of a class of high-order nonlinear-interconnected systems with denial-of-service (DoS) attacks, nonlinear dynamics, and exogenous disturbances. First, extreme learning machine (ELM) and adaptive techniques are adopted to approximate the unknown nonlinearities. Then, novel adaptive ELM-based nonlinear state observers with adaptive compensation functions are developed to estimate the unmeasurable states during DoS attacks under the influence of the disturbances. Further, by combining with the backstepping control and filtering techniques, adaptive ELM-based controllers are proposed to achieve uniformly ultimately bounded results based on the observation and adaption control signals under the influence of DoS attacks, nonlinear dynamics, and exogenous disturbances. Comparative studies are carried out to validate the effectiveness of the developed ELM-based adaptive observation and control strategies for two interconnected power systems.

4.
IEEE Trans Cybern ; 53(3): 1587-1597, 2023 Mar.
Article in English | MEDLINE | ID: mdl-34478395

ABSTRACT

In this article, two novel distributed variational Bayesian (VB) algorithms for a general class of conjugate-exponential models are proposed over synchronous and asynchronous sensor networks. First, we design a penalty-based distributed VB (PB-DVB) algorithm for synchronous networks, where a penalty function based on the Kullback-Leibler (KL) divergence is introduced to penalize the difference of posterior distributions between nodes. Then, a token-passing-based distributed VB (TPB-DVB) algorithm is developed for asynchronous networks by borrowing the token-passing approach and the stochastic variational inference. Finally, applications of the proposed algorithm on the Gaussian mixture model (GMM) are exhibited. Simulation results show that the PB-DVB algorithm has good performance in the aspects of estimation/inference ability, robustness against initialization, and convergence speed, and the TPB-DVB algorithm is superior to existing token-passing-based distributed clustering algorithms.
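
The KL-divergence penalty mentioned in the abstract above can be illustrated with a minimal sketch. It assumes univariate Gaussian posteriors, for which the KL divergence has a well-known closed form; the function names and the `weight` parameter are hypothetical and are not taken from the article, whose PB-DVB algorithm covers general conjugate-exponential models.

```python
import math


def kl_gaussian(mean_p, std_p, mean_q, std_q):
    """KL(p || q) for univariate Gaussians p = N(mean_p, std_p^2) and q = N(mean_q, std_q^2)."""
    if std_p <= 0 or std_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (math.log(std_q / std_p)
            + (std_p ** 2 + (mean_p - mean_q) ** 2) / (2.0 * std_q ** 2)
            - 0.5)


def disagreement_penalty(local, neighbours, weight=1.0):
    """Hypothetical penalty: weighted sum of KL divergences from one node's
    posterior (mean, std) to each neighbour's posterior (mean, std)."""
    mean_i, std_i = local
    return weight * sum(kl_gaussian(mean_i, std_i, m, s) for m, s in neighbours)


# A node that agrees with one neighbour and disagrees with another.
penalty = disagreement_penalty((0.0, 1.0), [(0.0, 1.0), (1.0, 1.0)])
```

The penalty is zero only when all neighbouring posteriors coincide with the local one, which is the property a penalty-based scheme relies on to pull nodes toward agreement.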

5.
IEEE Trans Cybern ; 53(4): 2087-2096, 2023 Apr.
Article in English | MEDLINE | ID: mdl-34543217

ABSTRACT

This article is centered on the cybersecurity research of dynamic state estimation for power systems with measurement delays. Relying on mixed measurements from phasor measurement units (PMUs) and remote terminal units (RTUs), a delayed measurement model is constructed. A modified state estimator based on the Kalman filter (KF) is designed, which can obtain the optimal estimated states under measurement delays. Moreover, the measurement data transmitted from the sensor to the estimator are vulnerable to cyberattacks. Especially, false data-injection (FDI) attacks are frequently encountered in the power system state estimation (PSSE) process. In the case of measurement delays, an FDI attack strategy is designed to interfere with the state estimator and evade detection by the chi-square detector. By utilizing the attacked estimated information and the uncorrupted measurement information, two measurement residual vectors are designed. According to these two residual vectors, a chi-square-based attack detection method is proposed, which has the ability to detect the attack without being affected by the delayed measurements. The proposed KF algorithm and attack detection method are implemented on an IEEE 14-bus system and they are confirmed to be effective and feasible.
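
The chi-square detector referred to in the abstract above is commonly built on the Kalman filter innovation. The following is a minimal sketch of that standard residual test only, under loud assumptions: a scalar random-walk state, a direct measurement, no measurement delay, and illustrative noise variances `q` and `r`. It is not the delay-aware estimator or the two-residual detection method proposed in the article.

```python
# 95th percentile of the chi-square distribution with 1 degree of freedom.
CHI2_1DOF_95 = 3.841


def kf_step(x_est, p_est, z, q=0.01, r=0.04, threshold=CHI2_1DOF_95):
    """One predict/update step of a scalar Kalman filter with a chi-square test.

    Assumed model (illustrative): x_k = x_{k-1} + w, w ~ N(0, q); z_k = x_k + v, v ~ N(0, r).
    Returns (x_new, p_new, statistic, alarm).
    """
    # Predict step for the random-walk model.
    x_pred = x_est
    p_pred = p_est + q
    # Innovation (measurement residual) and its variance.
    residual = z - x_pred
    s = p_pred + r
    # Normalised squared residual; chi-square with 1 dof when no attack is present.
    statistic = residual ** 2 / s
    alarm = statistic > threshold
    # Standard Kalman update.
    gain = p_pred / s
    x_new = x_pred + gain * residual
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new, statistic, alarm


# A plausible measurement passes; a grossly biased one raises the alarm.
_, _, stat_ok, alarm_ok = kf_step(1.0, 0.05, 1.05)
_, _, stat_bad, alarm_bad = kf_step(1.0, 0.05, 2.0)
```

A stealthy FDI attack of the kind the abstract describes is one that keeps this statistic below the threshold, which is why a plain residual test alone is not sufficient.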

6.
IEEE Trans Neural Netw Learn Syst ; 33(5): 1914-1924, 2022 May.
Article in English | MEDLINE | ID: mdl-33064652

ABSTRACT

Inspired by collective decision making in biological systems, such as a honeybee swarm searching for a new colony, we study a dynamic collective choice problem for large-population systems with the purpose of realizing certain advantageous features observed in biology. This problem focuses on the situation where a large number of heterogeneous agents subject to adversarial disturbances move from initial positions toward one of the destinations in a finite time while trying to remain close to the average trajectory of all agents. To overcome the complexity of this problem resulting from the large population and the heterogeneity of agents, and also to enforce some specific choices by individuals, we formulate the problem under consideration as a robust mean-field game with non-convex and non-smooth cost functions. Through the Nash equivalence principle, we first deal with a single-player H∞ tracking problem by taking the population behavior as a fixed trajectory, and then establish a mean-field system to estimate the population behavior. Optimal control strategies and worst disturbances, independent of the population size, are designed, which give a way to realize the collective decision-making behavior that emerges in biological systems. We further prove that the designed strategies constitute an ϵ_N-Nash equilibrium, where ϵ_N goes toward zero as the number of agents increases to infinity. The effectiveness of the proposed results is illustrated through two simulation examples.


Subjects
Non-alcoholic Fatty Liver Disease, Algorithms, Animals, Bees, Computer Simulation, Humans, Neural Networks, Computer

7.
IEEE Trans Neural Netw Learn Syst ; 33(4): 1429-1440, 2022 Apr.
Article in English | MEDLINE | ID: mdl-33351765

ABSTRACT

In this article, we study a multiplayer Stackelberg-Nash game (SNG) pertaining to a nonlinear dynamical system, including one leader and multiple followers. At the higher level, the leader makes its decision preferentially with consideration of the reaction functions of all followers, while, at the lower level, each of the followers reacts optimally to the leader's strategy simultaneously by playing a Nash game. First, the optimal strategies for the leader and the followers are derived from the bottom up, and these strategies are further shown to constitute the Stackelberg-Nash equilibrium points. Subsequently, to overcome the difficulty in calculating the equilibrium points analytically, we develop a novel two-level value iteration-based integral reinforcement learning (VI-IRL) algorithm that relies only upon partial information of system dynamics. We establish that the proposed method converges asymptotically to the equilibrium strategies under the weak coupling conditions. Moreover, we introduce effective termination criteria to guarantee the admissibility of the policy (strategy) profile obtained from a finite number of iterations of the proposed algorithm. In the implementation of our scheme, we employ neural networks (NNs) to approximate the value functions and invoke the least-squares methods to update the involved weights. Finally, the effectiveness of the developed algorithm is verified by two simulation examples.

8.
IEEE Trans Cybern ; 52(3): 1565-1574, 2022 Mar.
Article in English | MEDLINE | ID: mdl-32459623

ABSTRACT

In this article, the consensus problem of linear systems is revisited from a novel geometric perspective. The interaction network of these systems is assumed to be piecewise fixed. Moreover, it is allowed to be disconnected at any time but holds a quite mild joint connectivity property. The system matrix is marginally stable and the input matrix is not of full-row rank. By directly examining the subspace determined by the network, we first establish convergence by resorting to an observability condition. Then, according to joint connectivity, we are able to extend this convergence uniformly to the entire orthogonal complement of the consensus manifold. In this way, we work out the necessary and sufficient condition for exponential consensus. It turns out that, with a suitably designed feedback matrix, exponential consensus can be realized globally and uniformly if and only if a jointly (δ,T)-connected condition and an observability condition relying only on the system and input matrices are satisfied. We also characterize the lower bound of the convergence rate. Simple yet effective examples are presented to illustrate the findings.
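
The role of joint connectivity in the abstract above can be seen in a much simpler setting. The sketch below assumes single-integrator agents with scalar states and a discrete-time averaging rule, which is far more restrictive than the general linear systems treated in the article. The two edge sets, the step size `eps`, and all function names are illustrative assumptions. Each graph is disconnected on its own, but their union is connected.

```python
def consensus_step(x, edges, eps=0.3):
    """One synchronous update x_i <- x_i + eps * sum_j (x_j - x_i) over undirected edges."""
    new_x = list(x)
    for i, j in edges:
        new_x[i] += eps * (x[j] - x[i])
        new_x[j] += eps * (x[i] - x[j])
    return new_x


def run_switching_consensus(x0, graphs, steps=200):
    """Cycle through the edge sets in `graphs`, using one edge set per time step."""
    x = list(x0)
    for k in range(steps):
        x = consensus_step(x, graphs[k % len(graphs)])
    return x


# Neither graph connects all four agents, but their union is a path 0-1-2-3.
GRAPH_A = [(0, 1), (2, 3)]
GRAPH_B = [(1, 2)]
X0 = [4.0, -2.0, 7.0, 1.0]

final = run_switching_consensus(X0, [GRAPH_A, GRAPH_B])  # switching topology
split = run_switching_consensus(X0, [GRAPH_A])           # fixed, disconnected topology
```

With the switching sequence all states approach the initial average, whereas with `GRAPH_A` alone the agents only agree within each pair.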

9.
IEEE Trans Cybern ; 52(2): 1061-1072, 2022 Feb.
Article in English | MEDLINE | ID: mdl-32471806

ABSTRACT

This article studies the containment control problem for a group of linear systems, consisting of more than one leader, over switching topologies. The input matrices of these linear systems are not required to have full-row rank and the switching can be arbitrary, making the problem quite general and challenging. We propose a novel analysis framework from the viewpoint of a state transition matrix. Specifically, according to the inherent linearity, we successfully establish a connection between state transition matrices of the above multileader system and a virtual leader-following system obtained by combining those leaders. This enlightening result relates the containment problem to a consensus one. Then, by analyzing the property of the state transition matrix, we uncover that each component of any follower's state converges to the convex hull spanned by the corresponding components of the leaders', provided some mild conditions are satisfied. These conditions are derived in terms of the concept of a positive linear system. A special case of the second-order linear system is further discussed to illustrate these conditions. Moreover, two different design methods of the feedback gain matrix are provided, which additionally require that the network topology contains a united spanning tree all the time.

10.
IEEE Trans Cybern ; 51(2): 994-1003, 2021 Feb.
Article in English | MEDLINE | ID: mdl-31107677

ABSTRACT

In this paper, we investigate the security of remote state estimation in cyber-physical systems (CPSs), where a wireless sensor utilizes a channel hopping scheme to transmit data to the remote estimator over multiple channels in the presence of periodic denial-of-service attacks. Assume that the jammer can interfere with a subset of channels at each attack time in the active period. For an energy-constrained jammer, the problem of how to select the number of channels at each attack time to maximally deteriorate the CPS performance is investigated. Based on the index of average estimation error, we introduce two different attack strategies, which select either an identical number or an unequal number of channels at each attack time, and further show theoretically that the attack effect of selecting an unequal number of channels is stronger than that of selecting an identical number of channels. By formulating the problem of selecting the number of channels as integer programming problems, we present the corresponding algorithm to approximate the optimal attack schedule for both cases. Numerical results are presented to validate the theoretical results and the effectiveness of the proposed algorithms.

11.
IEEE Trans Neural Netw Learn Syst ; 32(4): 1600-1611, 2021 Apr.
Article in English | MEDLINE | ID: mdl-32340962

ABSTRACT

Considering the fact that in the real world, a certain agent may have some sort of advantage to act before others, a novel hierarchical optimal synchronization problem for linear systems, composed of one major agent and multiple minor agents, is formulated and studied in this article from a Stackelberg-Nash game perspective. The major agent herein makes its decision prior to others, and then, all the minor agents determine their actions simultaneously. To seek the optimal controllers, the Hamilton-Jacobi-Bellman (HJB) equations in coupled forms are established, whose solutions are further proven to be stable and constitute the Stackelberg-Nash equilibrium. Due to the introduction of the asymmetric roles for agents, the established HJB equations are more strongly coupled and more difficult to solve than those given in most existing works. Therefore, we propose a new reinforcement learning (RL) algorithm, i.e., a two-level value iteration (VI) algorithm, which does not rely on complete system matrices. Furthermore, the proposed algorithm is shown to be convergent, and the converged values are exactly the optimal ones. To implement this VI algorithm, neural networks (NNs) are employed to approximate the value functions, and the gradient descent method is used to update the weights of NNs. Finally, an illustrative example is provided to verify the effectiveness of the proposed algorithm.

12.
IEEE Trans Cybern ; 50(9): 4146-4156, 2020 Sep.
Article in English | MEDLINE | ID: mdl-31251206

ABSTRACT

Economic dispatch (ED) and unit commitment (UC) problems need to be revisited in order to make a transition from a traditional power system to a smart grid. In this paper, we formulate the ED and UC problems into a unified form, which is also capable of characterizing the infinite horizon UC problem. Based on the formulation, a centralized Q-learning-based optimization algorithm is proposed. The proposed algorithm runs in an online manner and requires no prior information on the mathematical formulation of the actual cost functions, thus being capable of dealing with situations for which such cost functions are too difficult to obtain. Then, the distributed counterpart of the centralized algorithm is developed by relaxing the demand for global information and balancing exploration and exploitation cooperatively in a distributed way. Theoretical analysis of the proposed algorithms is also provided. Finally, several case studies are presented to demonstrate the effectiveness of the proposed algorithms.
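
The idea of learning a dispatch without knowing the cost functions, as described in the abstract above, can be sketched with a heavily simplified, stateless form of Q-learning. Everything here is an assumption made for illustration: two generators, a fixed demand of 100, a coarse grid of candidate outputs, and quadratic cost coefficients that the learner never sees directly. It is not the centralized or distributed algorithm of the paper.

```python
import random


def true_cost(p1, demand=100.0):
    """Hidden total cost; the learner only observes sampled values (coefficients are illustrative)."""
    p2 = demand - p1  # the second generator covers the rest of the demand
    return (0.01 * p1 ** 2 + 2.0 * p1) + (0.02 * p2 ** 2 + 1.5 * p2)


def learn_dispatch(actions, episodes=2000, alpha=0.5, epsilon=0.1, seed=0):
    """Single-state Q-learning with epsilon-greedy exploration; reward = -cost."""
    rng = random.Random(seed)
    # Zero initial values are optimistic here because every observed reward is negative.
    q = {a: 0.0 for a in actions}
    for _ in range(episodes):
        if rng.random() < epsilon:
            a = rng.choice(actions)   # explore
        else:
            a = max(q, key=q.get)     # exploit the current estimate
        reward = -true_cost(a)
        q[a] += alpha * (reward - q[a])
    return max(q, key=q.get), q


# Candidate outputs (same unit as the demand) for generator 1.
best_p1, q_table = learn_dispatch([0.0, 25.0, 50.0, 75.0, 100.0])
```

The learner interacts with `true_cost` only through sampled values, which mirrors the abstract's point that no closed-form cost model is required.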

13.
IEEE Trans Neural Netw Learn Syst ; 30(12): 3633-3644, 2019 Dec.
Article in English | MEDLINE | ID: mdl-30946680

ABSTRACT

In this paper, the consensus problem is investigated for a class of nonaffine nonlinear multiagent systems (MASs) with actuator faults, namely partial loss of effectiveness faults and bias faults. To deal with the control difficulty caused by the nonaffine dynamics, a neural network (NN)-based adaptive consensus protocol is developed based on the Lyapunov analysis. The neuron input of the NN uses both the state information and the consensus error information. In addition, the negative feedback term of the NN weight update law is multiplied by the absolute value of the consensus error, which is helpful in improving the consensus accuracy. With the developed adaptive NN consensus protocol, semiglobal consensus with a bounded residual consensus error of the MAS is achieved, and the boundedness of the NN weight matrix is guaranteed. Finally, simulation results show that the developed adaptive NN consensus protocol has the advantages of a fast convergence rate and good consensus accuracy and is capable of responding rapidly to actuator faults.

14.
IEEE Trans Cybern ; 49(5): 1605-1615, 2019 May.
Article in English | MEDLINE | ID: mdl-29993675

ABSTRACT

This paper investigates the consensus tracking problem of second-order nonlinear multiagent systems (MAS) with disturbance and actuator faults by the sliding mode control method. The communication topology of the MAS is directed and only some of the followers have access to the leader's information. First, a discontinuous sliding mode tracking protocol is studied for consensus tracking of the MAS. Second, to address the chattering and the difficulty of setting the control gain in the discontinuous protocol, a continuous sliding mode tracking protocol with an adaptive mechanism is developed. The adaptive mechanism adjusts the control gain automatically and enables the tracking protocol to work well without prior knowledge of the MAS. Third, the performance of the adaptive sliding mode protocol for consensus tracking of the MAS in the presence of actuator faults, namely bias faults and partial loss of effectiveness faults, is further investigated. Finally, numerical simulations are performed to illustrate the efficiency of the theoretical results.

15.
IEEE Trans Neural Netw Learn Syst ; 30(1): 85-96, 2019 Jan.
Article in English | MEDLINE | ID: mdl-29993726

ABSTRACT

In this paper, we investigate the optimal synchronization problem for a group of generic linear systems with input saturation. To seek the optimal controller, Hamilton-Jacobi-Bellman (HJB) equations involving nonquadratic input energy terms in coupled forms are established. The solutions to these coupled HJB equations are further proven to be optimal and the induced controllers constitute an interactive Nash equilibrium. Owing to the difficulty of analytically solving HJB equations, especially in coupled forms, and the possible lack of model information of the systems, we apply a data-based off-policy reinforcement learning algorithm to learn the optimal control policies. As a byproduct, this off-policy algorithm is shown to be insensitive to the probing noise that is applied to the system to maintain the persistence of excitation condition. In order to implement this off-policy algorithm, we employ actor and critic neural networks to approximate the controllers and the cost functions. Furthermore, the estimated control policies obtained by the presented implementation are proven to converge to the optimal ones under certain conditions. Finally, an illustrative example is provided to verify the effectiveness of the proposed algorithm.

16.
IEEE Trans Neural Netw Learn Syst ; 30(1): 215-224, 2019 Jan.
Article in English | MEDLINE | ID: mdl-29994226

ABSTRACT

In network systems, a group of nodes may evolve into several subgroups and coordinate with each other in the same subgroup, i.e., reach cluster synchronization, to cope with unanticipated situations. To this end, the leader-following practical cluster synchronization problem of networks of generic linear systems is studied in this paper. An event-based control algorithm that can largely reduce the amount of communication is first proposed over directed communication topologies. In the proposed algorithm, each node decides by itself when to transmit its current state to its neighbors and how to update its controller according to the estimates of its own state and those of its neighbors. Then, the Lyapunov method is utilized to perform the convergence analysis. It shows that practical cluster synchronization can be ensured by choosing appropriate parameters no matter what kind of state estimation is applied. Furthermore, Zeno behavior is also excluded for each node under some mild assumptions. In addition, three common kinds of state estimation, namely the zero-order hold model, the first-order approximate model, and high-order model-based estimation, are each analyzed from the perspective of the exclusion of Zeno behavior. Finally, several simulations demonstrate the validity of the proposed algorithm, briefly present the effects of the parameters concerned, and compare the effects of the three estimations.

17.
IEEE Trans Cybern ; 49(12): 4117-4128, 2019 Dec.
Article in English | MEDLINE | ID: mdl-30207972

ABSTRACT

In this paper, we investigate the output containment control problem for a network of heterogeneous linear multiagent systems. The control target is to drive the outputs of the followers into the convex hull spanned by the leaders. To this end, we first derive a necessary condition imposed on both system dynamics and network topology from the viewpoint of the internal model principle. Then, based on the necessary condition, we utilize a dynamic controller to drive the outputs of the leaders and followers to track the reference trajectories to achieve containment exponentially. We consider a general network topology which only contains a united spanning tree. Both fixed and dynamic network topologies are taken into consideration. Then, the optimal control problem for containment is further studied. An optimal control law is constructed from an algebraic Riccati equation, which is proved to be a stabilizing one as well. Finally, a reinforcement learning algorithm is introduced to solve the optimal control problem online without knowledge of the system dynamics. Simulations are given to validate our theoretical findings.
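
The statement in the abstract above that a control law built from an algebraic Riccati equation is also stabilizing can be checked by hand in the scalar case. The sketch below assumes a single continuous-time system dx/dt = a*x + b*u with cost integral of q*x^2 + r*u^2, which is a textbook special case and not the multiagent containment setting of the paper. The function name and the numeric values are illustrative.

```python
import math


def scalar_lqr(a, b, q, r):
    """Positive root P of 2*a*P - (b**2 / r) * P**2 + q = 0 and the gain K = b*P/r.

    The control law is u = -K*x, giving the closed-loop dynamics dx/dt = (a - b*K)*x.
    """
    if b == 0 or q <= 0 or r <= 0:
        raise ValueError("need b != 0, q > 0, r > 0")
    root = math.sqrt(a ** 2 + b ** 2 * q / r)
    p = r * (a + root) / b ** 2
    k = b * p / r
    return p, k


# An unstable open-loop system (a > 0) is stabilized by the Riccati-based gain.
A, B = 1.0, 1.0
p, k = scalar_lqr(A, B, q=3.0, r=1.0)
closed_loop_pole = A - B * k
```

In this scalar case the closed-loop pole equals -sqrt(a^2 + b^2*q/r), so it is negative for any admissible parameters, which is the stabilizing property in its simplest form.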

18.
ISA Trans ; 80: 1-11, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29861046

ABSTRACT

The stabilization problem for a class of switched linear systems is investigated in the network environment. Both the synchronous and asynchronous cases are considered according to the availability of the currently activated system mode to the actuator. The random communication delay is assumed to be Markovian, resulting in a sampled-data synchronous or asynchronous switched system with Markovian delay as the closed-loop system. We extend the discretization approach to deal with such a sampled-data system by exploring the stability conditions of the corresponding discrete-time system. For the asynchronous case, we formulate the closed-loop system as a hybrid system with the switching between its subsystems governed by a switching signal and a Markov chain. By studying the switching number and one-step reachable mode set of the constructed vector-valued switching signal, the exponential mean-square stability (EMSS) conditions and the corresponding mode-dependent controller are obtained with a more general constraint on the designed switching signal. These results are finally verified by two illustrative numerical examples.

19.
IEEE Trans Neural Netw Learn Syst ; 29(5): 1747-1759, 2018 May.
Article in English | MEDLINE | ID: mdl-28391208

ABSTRACT

The cluster synchronization problem is investigated using intermittent pinning control for the interacting clusters of nonidentical nodes that may represent either general linear systems or nonlinear oscillators. These nodes communicate over a general network topology, and the nodes from different clusters are governed by different self-dynamics. A unified convergence analysis is provided to analyze the synchronization via intermittent pinning controllers. It is observed that the nodes in different clusters synchronize to the given patterns if a directed spanning tree exists in the underlying topology of every extended cluster (which consists of the original cluster of nodes as well as their pinning node) and one algebraic condition holds. Structural conditions are then derived to guarantee such an algebraic condition. That is: 1) if the intracluster couplings are sufficiently strong and the pinning controller has a sufficiently long execution time in every period, then the algebraic condition for general linear systems is guaranteed and 2) if every cluster has sufficiently strong intracluster coupling, then the execution time of the pinning controller for nonlinear oscillators can be arbitrarily short. Lower bounds are explicitly derived both for these coupling strengths and for the execution time of the pinning controller in every period. In addition, with regard to the above-mentioned structural conditions for nonlinear systems, an adaptive law is further introduced to adapt the intracluster coupling strength, such that the cluster synchronization for nonlinear systems is achieved.

20.
IEEE Trans Cybern ; 47(12): 4122-4133, 2017 Dec.
Article in English | MEDLINE | ID: mdl-28113615

ABSTRACT

This paper investigates group synchronization for multiple interacting clusters of nonidentical systems that are linearly or nonlinearly coupled. By observing the structure of the coupling topology, a Lyapunov function-based approach is proposed to deal with the case of linear systems which are linearly coupled in the framework of directed topology. Such an analysis is then further extended to tackle the case of nonlinear systems in a similar framework. Moreover, the case of nonlinear systems which are nonlinearly coupled is also addressed, although only in the framework of undirected coupling topology. For all these cases, a consistent conclusion is reached: group synchronization can be achieved if the coupling topology for each cluster satisfies a certain connectivity condition and, further, the intra-cluster coupling strengths are sufficiently strong. Both the lower bound for the intra-cluster coupling strength and the convergence rate are explicitly specified.
